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Abstract 

We consider the problem of estimating high-dimensional Gaussian graphical models corresponding to a sin- 
gle set of variables under several distinct conditions. This problem is motivated by the task of recovering 
transcriptional regulatory networks on the basis of gene expression data. We assume that most aspects of 
the conditional dependence networks are shared, but that there are some structured differences between them. 
Rather than assuming that similarities and differences between networks are driven by individual edges, we 
take a node-based approach. We consider estimation under two distinct assumptions: (1) differences between 
the K networks are due to individual nodes that are perturbed across conditions, or (2) similarities among 
the K networks are due to the presence of common hub nodes that are shared across all K networks. Using a 
row-column overlap norm penalty function, we formulate two convex optimization problems that correspond 
to these two assumptions. We solve these problems using an alternating directions method of multipliers al- 
gorithm, and we derive a set of necessary and sufficient conditions that allows us to decompose the problem 
into independent subproblems so that our algorithm can be scaled to high-dimensional settings. Our proposal 
is illustrated on synthetic data and on a brain cancer gene expression data set. 

Keywords: graphical models, structured sparsity, alternating direction method of multipliers, gene regulatory 
networks, lasso, multivariate normal 



1. Introduction 



Graphical models encode the conditional dependence relationships among a set of p variables, or features 
( [Lauritzenl 1 1 996| l . They are a tool of growing importance in a number of fields, including finance, biology, 
and computer vision. A graphical model is often referred to as a conditional dependence network, or simply 
as a network. Motivated by network terminology, we can refer to the p variables in a graphical model as 
nodes. If a pair of variables (or features) are conditionally dependent, then there is an edge between the 
corresponding pair of nodes; otherwise, no edge is present. 

Suppose that we have n observations that are independently drawn from a multivariate normal distribution 
with covariance matrix E. Then the corresponding Gaussian graphical model (GGM) that describes the 
conditional dependence relations hips among the features is encoded b y the sparsity pattern of the inverse 



covariance matrix, E ' (see e.g. 'Mardia et al. 



1979 
1 



Lauritzen 



1996|. That is, the jth and j'th variables 
are conditionally independent if and only if (E^')^^' = 0. Unfortunately, when p ^ n or p ^ n, obtaining 
an accurate estimate of E^' is challenging. In such a scenario, we can use prior information — such as the 
knowledge that many of the pa irs of variables ar e condition ally independ ent — in order to more accurately 
estimate E^' (see e.g. Yuan and Lin , 2007a; Friedman et al.,,2007; Banerjee et al. 2008j . 

In this paper, we consider the task of estimating K GGMs on a single set of p variables under the assump- 
tion that the GGMs are similar, with certain structured differences. As a motivating example, suppose that we 
have access to gene expression measurements for «i lung cancer samples and «2 normal lung samples, and 



1 



that we would like to estimate the gene regulatory networks underlying the normal and cancer lung tissue. 
We can model each of these regulatory networks using a GGM. We have two obvious options. 



1. We can estimate a single network on the basis of all «i +«2 tissue samples. But this approach overlooks 
fundamental differences between the true lung cancer and normal gene regulatory networks. 

2. We can estimate separate networks based on the «i cancer and n2 normal samples. However, this 
approach fails to exploit substantial commonality of the two networks, such as lung-specific pathways. 

In order to effectively make use of the available data, we need a principled approach for jointly estimating 
the two networks in such a way that the two estimates are encouraged to be quite similar to each other, while 
allowing for certain structured differences. In fact, these differences may be of scientific interest. 

Another example of estimating multiple GGMs arises in the analysis of the conditional dependence re- 
lationships among p stocks at two distinct points in time. We might be interested in detecting stocks that 
have differential connectivity with all other stocks across the two time points, as these likely correspond to 
companies that have undergone significant changes. Yet another example occurs in the field of neuroscience, 
in which it is of interest to learn how the connectivity of neurons changes over time. 

Past work on joint estimation of multiple GGMs has assumed that individual edges are shared or differ 



across conditions (see e.g. Kolar et al. 2010 Zhang and Wang 2010 Guoetal. 2011 Danaheretal. 2012 1; 



here we refer to such approaches as edge-based. In this paper, we instead take a node-based approach: 
we seek to estimate K GGMs under the assumption that similarities and differences between networks are 
driven by individual nodes whose patterns of connectivity to other nodes are shared across networks, or differ 
between networks. As we will see, node-based learning is more powerful than edge-based learning, since it 
more fully exploits our prior assumptions about the similarities and differences between networks. 
More specifically, in this paper we consider two types of shared network structure. 

1. Certain nodes serve as highly-connected hub nodes. We assume that the same nodes serve as hubs in 
each of the K networks. Figure[T]illustrates a toy example of this setting, with p ^5 nodes and K — 2 
networks. In this example, the second variable, X2, serves as a hub node in each network. In the context 
of transcriptional regulatory networks, X2 might represent a gene that encodes a transcription factor 
that regulates a large number of downstream genes. We propose the common hub (co-hub) node joint 
graphical lasso (CNJGL), a convex optimization problem for estimating GGMs in this setting. 

2. The networks differ due to particular nodes that are perturbed across conditions, and therefore have 
a completely different connectivity pattern to other nodes in the K networks. Figure |2] displays a toy 
example, with p ^ 5 nodes and K — 2 networks; here we see that all of the network differences are 
driven by perturbation in the second variable, X2. In the context of transcriptional regulatory networks, 
X2 might represent a gene that is mutated in a particular condition, effectively disrupting its conditional 
dependence relationships with other genes. We propose the perturbed-node joint graphical lasso (PN- 
JGL), a convex optimization problem for estimating GGMs in this context. 

Node-based learning of multiple GGMs is challenging, due to complications resulting from symmetry of the 
precision matrices. In this paper, we overcome this problem through the use of a new convex regularizer 

The rest of this paper is organized as follows. We introduce some relevant background material in Section 
2. In Section 3, we present the row-column overlap norm (RCON), a regularizer that encourages a matrix 
to have a support that is the union of a set of rows and columns. We apply the RCON penalty to a pair of 
inverse covariance matrices, or to the difference between a pair of inverse covariance matrices, in order to 
obtain the CNJGL and PNJGL formulations just described. In Section 4, we propose an alternating direction 
method of multipliers (ADMM) algorithm in order to solve these two convex formulations. In order to 
scale this algorithm to problems with many features, in Section 5 we introduce a set of simple conditions 
on the regularization parameters that indicate that the problem can be broken down into many independent 
subproblems, leading to substantial algorithm speed-ups. In Section 6, we apply CNJGL and PNJGL to 
synthetic data, and to gene expression data. The Discussion is in Section 7. Proofs are in the Appendix. 
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(a) 



(b) 



Figure 1: Two networks share a common hub (co-hub) node. X2 serves as a hub node in both networks, (a): 
Network 1 and its adjacency matrix, (b): Network 2 and its adjacency matrix. 
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(a) (b) 

Figure 2: Two networks that differ due to node perturbation of X2- (a): Network 1 and its adjacency matrix. 

(b): Network 2 and its adjacency matrix, (c): Left: Edges that differ between the two networks. 
Right: Shaded cells indicate edges that differ between Networks 1 and 2. 



A preliminary version of some of the ideas in this paper appear in Mohan et al. ( 2012 1. There the PNJGL 
formulation was proposed, along with an ADMM algorithm. Here we expand upon that formulation and 
present the CNJGL formulation, an ADMM algorithm for solving it, as well as comprehensive results on 
both real and simulated data. Furthermore, in this paper we discuss theoretical conditions for computational 
speed-ups, which are critical to application of both PNJGL and CNJGL to data sets with many features. 



2. Background on High-Dimensional GGM Estimation 



2.1 The Graphical Lasso for Estimating a Single GGM 

As was mentioned in Section [T] estimating a single GGM on the basis of n independent and ide ntically 
distributed observations from a A^p(0, £) distribution amounts to learning the sparsity structure of E^' (Mardia 

When n > p, one can estimate E^' by maximum likelihood. But in high 



et al. 



1979 



Lauritzen 



19961 



dimensions when p is large relative to n, this is not possible because the empirical covariance matrix is 



singular. Consequently, a number of authors (among others, Yuan and Lin 2007a[ [Friedman et al. 2007 



Ravikumar et al. 2008 Banerjee et al. 2008 Scheinberg et al. 2010 Hsieh et aT] |201 1 1 have considered 



maximizing the penalized log likelihood 



maximize {log det ^ 



-trace(5©)-A.||©||i}, 



(1) 



where S is the empirical covariance matrix. A- is a nonnegative tuning parameter, §'J__|_ denotes the set of 
positive definite matrices of size p, and ||0|| 1 = \ Th^ solution to (jlj serves as an estimate of E~\ 
and a zero element in the solution corresponds to a pair of features that are estimated to be conditionally 
independent. Due to the ii penalty (Tibshirani 1996 1 in (fill, this estimate will be positive definite for any 

as the graphical lasso. Proble m ([T]l is convex. 



A, > 0, and sparse when X is sufficiently large. We refer to 1 



and efficient algorithms for solving it are available (among others, 'Friedman et al. 


20071 


Banerjee et al. 


|2008j|Rothmanetal. 2008 D' Aspremont et al. 2008j|Scheinberg et al. 


|2010,|Witten et al. 


201 l|l. 



2.2 The Joint Graphical Lasso for Estimating Multiple GGMs 

Several formulations have recently been proposed for extending the graphical lasso ([T]l to the setting in which 
one has access to a number of observations from K distinct conditions, each with measurements on the same 
set of p features. The goal is to estimate a graphical model for each condition under the assumption that the 
K networks share certain characteristics but are allowed to differ in certain structured ways. |Guo et alT ( 201 1 1 



take a non-convex approach to solving this problem. I Zhang and Wang| p010| ) take a convex approach, but 
use a least squares loss function rather than the negative Gaussian log likelihood. Here we review the convex 



formulation of Danaher et al. ( 2012 1, which forms the starting point for the proposal in this paper 

Suppose that X^,... G W are independent and identically distributed from a A^p(0,E'^) distribution, 
for ^ = 1, . . . Here «jt is the number of observations in the k\h condition, or class. Letting 5* denote the 
empirical covariance matrix for the ^th class, we can maximize the penalized log likelihood 

K 



maximize 



L(0i , . . . , 0^^) - ^ 110^11 1 - X2Y^P{@]j, ...,05), (2) 

0'GS';^....,0-GS';+ [ ^fi <w J 

where L(0V . . ,0^) = Lf=i "it (logdet0*^ — trace(5'^0'^)), A-i and A-2 are nonnegative tuning parameters, and 
P{@ij, • • . , 0^) is a convex penalty function applied to each off-diagonal element of 0' , ... , 0^ in order to en- 
courage similarity among them. Then the 0\ . . . , 0^ that solve ^ serve as estimates for (E' \ . . . , (TJ^)^^ . 
Danaher et al. (2012 1 refer to ([2]l as the joint graphical lasso (JGL). In particular, they consider the use of a 



fused lasso penalty ( Tibshirani et al.|[2005[ ). 



k<k' 



on the differences between pairs of network edges, as well as a group lasso penalty ( [Yuan and Lin|[20(37b| l, 



k=l 



on the edges themselves. |Danaher et al. ( 2012 1 refer to problem (|2]i combined with ([3]l as the fused graphical 



lasso (FGL), and to (|2]) combined with (|4]| as the group graphical lasso (GGL). FGL encourages the K 
network estimates to have identical edge values, whereas GGL encourages the K network estimates to have 
a shared pattern of sparsity. Both the FGL and GGL optimization problems are convex. An approach related 



to FGL and GGL is proposed in |Hara and Washio ( 2013) 1. 



Because FGL and GGL borrow strength across all available observations in estimating each network, they 
can lead to much more accurate inference than simply learning each of the K networks separately. But both 
FGL and GGL take an edge-based approach: they assume that differences between and similarities among 
the networks arise from individual edges. In this paper, we propose a node-based formulation that allows for 
more powerful estimation of multiple GGMs, under the assumption that network similarities and differences 
arise from nodes whose connectivity patterns to other nodes are shared or disrupted across conditions. 



3. Node-Based Joint Graphical Lasso 

In this section, we first discuss the failure of a naive approach for node-based learning of multiple GGMs. 
We then present a norm that will play a critical role in our formulations for this task. Finally, we discuss two 
approaches for node-based learning of multiple GGMs. 

3.1 Why is Node-Based Learning Challenging? 

At first glance, node-based learning of multiple GGMs seems straightforward. For instance, consider the task 
of estimating K — 2 networks under the assumption that the connectivity patterns of individual nodes differ 
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(a) Naive group lasso (b) RCON: £i /(i (c) RCON: (d) RCON: £i /£, 



Figure 3: Toy example of the results from applying various penalties in order to estimate a 5 x 5 matrix, under 
a symmetry constraint. Zero elements are shown in white; non-zero elements are shown in shades 
of red (positive elements) and blue (negative elements), (a): The naive group lasso applied to the 
columns of the matrix yields non-zero elements that are the intersection, rather than the union, of 
a set of rows and columns, (b): The RCON penalty using an £i/£i norm results in unstructured 
sparsity in the estimated matrix, (c): The RCON penalty using an £i/£2 norm results in entire rows 
and columns of non-zero elements, (d): The RCON penalty using an £i/£oo norm results in entke 
rows and columns of non-zero elements; many take on a single maximal (absolute) value. 



across the networks. It seems that we could simply modify ([2]) combined with ([3]) as follows. 



maximize <^ L(0i , O^) - ||©i || ; - A-i ||02|| j - A.2 ||0) - 0]| 



(5) 



where 0^' is the yth column of the matrix 0*. This amounts to applying a group lasso (Yuan and Lin 2007b I 
penalty to the columns of 0' — 0^. Equation (jsjl seems to accomplish our goal of encouraging 0j = @j- We 
will refer to this as the naive group lasso approach. 

In (jsj, we have applied the group lasso using p groups; the jth group is the jth column of 0' — 0^. Due to 
the symmetry of 0' and 0^, there is substantial overlap among the p groups: the (/, j)th element of 0' — 0^ is 
contained in both the rth and jth groups. In the presence of overlapping groups, the group lasso penalty yields 
estimates whose support is the complement of the union of groups (Jacob et aL]|2009[|Obozinski et al.[|201 l| l. 
Figure [sj a) displays a simple example of the results obtained if we attempt to estimate (S')^' — (^■^) using 
(j5]l. The figure reveals that (|5]l cannot be used to detect node perturbation. 

A naive approach to co-hub detection is challenging for a similar reason. Recall that the jth node is a co- 
hub if the yth columns of both 0' and 0^ contain predominantly non-zero elements, and let diag(0) denote 
a matrix consisting of the diagonal elements of 0. It is tempting to formulate the optimization problem 



maximize 

0'gS'|, ,0-gS'|, 



L(0',02)-Xi||0il-Xi||02||i-X2£ 



0' -diag(0') 
02-diag(02) 



(6) 



where the group lasso penalty encourages the off-diagonal elements of many of the columns to be simulta- 
neously zero in 0' and 0^. Unfortunately, once again, the presence of overlapping groups encourages the 
support of the matrices 0' and 0^ to be the intersection of a set of rows and columns, as in Figurejsja), rather 
than the union of a set of rows and columns. 



3.2 Row-Column Overlap Norm 

Detection of perturbed nodes or co-hub nodes requires a penalty function that, when applied to a matrix, 
yields a support given by the union of a set of rows and columns. We now propose the row-column overlap 
norm (RCON) for this task. 



5 



Definition 1 The row-column overlap norm (RCON) induced by a matrix norm \\ . || is defined as 

n(0i,02,...,0^) = 



mm 

v',y^,...,v'' 



y2 



(7) 



subject to 0*^ = y*-' + {V'^y foik^l,...,K. 
It is easy to check that RCON satisfies the following properties. 

Property 1 i2 is indeed a norm for all matrix norms ||.||. Consequently, it is convex. 

Property 2 When ||.|| is symmetric in its argument, i.e., ||y|| = ||y^||, then 

n(0l,02,...,0^)-i 



01 

02 



0« 



(8) 



In this paper, we are interested in the particular class of RCON penalties where ||.|| is an £i/iq norm, 
p 

given by ||y || = ^ ll^ill?' where 1 < ^ < °°. With a little abuse of notation, we will let Q.g denote £2 when 

.7=1 

||.|| is given by the ii/tg norm. Property 2 indicates that Q.i{@\@^ , . . . ,@^) = j if^i Lj |0f,l- We note 
that £22 is a variant of the overlap norm ( Obozinski et al. 201 1[ [Jacob et aT] |2009| ) applied to groups given 
by rows and columns of 0\ . . . , 0^. Details of this relationship are described in Appendix|E] 

In Figure [3] we display schematic results obtained from estimating a 5 x 5 matrix subject to the RCON 
penalty £2^, for q — I, 2, and 00. We see from Figure|3jb) that when ^ = 1, then the RCON penalty yields a 
matrix estimate with unstructured sparsity; recall that £2i amounts to an £1 penalty applied to the elements of 
a matrix. When ^ = 2 or ^ = 00, we see from Figures [3];c)-(d) that the RCON penalty yields a sparse matrix 
estimate for which the non-zero elements are the union of a set of rows and columns. Additional properties 
of RCON are discussed in Appendix [A| 



3.3 Node-Based Approaches for Learning GGMs 

We now discuss two approaches for node-based learning of GGMs. The first promotes networks whose 
differences are attributable to perturbed nodes. The second encourages the networks to share co-hub nodes. 



3.3.1 Perturbed-node joint graphical lasso 

Consider the task of jointly estimating K precision matrices by solving 

maximize J L(0i , 0^, . . . ,0^) - A-i f; ||0'^|| 1 - A.2 F a,(0'' - 0*') I . (9) 
©'■0' [ k=i k<k' J 

We refer to the convex optimization problem (j9]l as the perturbed-node joint graphical lasso (PNJGL). Let 
©',0^,...,©'^ denote the solution to ([ij; these serve as estimates for (E^)"' , (E^)"\ . . . , (E^)"'. In (jijl, 
A,i and X2 are nonnegative tuning parameters, and ^ > 1 ■ When A-2 = 0, (|9]l amounts simply to applying 
the graphical lasso optimization problem ([T]i to each condition separately in order to separately estimate K 
networks. When A,2 > 0, we are encouraging similarity among the K network estimates. Property 2 of the 
RCON penalty leads to the following observation. 

Remark 2 The FGL formulation (Equations^and^ is a special case of PNJGL Q with q — I. 
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In other words, when ^ = 1, (|9]l amounts to the edge-based approach of Danaher et al. (2012 1 that encourages 



many entries of © — to equal zero. However, when q = 2 ox q = °°, then d9b amounts to a node-based 

approach: the support of — is encouraged to be a union of a few rows and the corresponding columns. 
These can be interpreted as a set of nodes that are perturbed across the conditions. An example of the sparsity 
structure detected by PNJGL with g' = 2 or ^ = oo is shown in Figure |2j 

3.3.2 Co-hub node joint graphical lasso 

We now consider jointly estimating K precision matrices by solving the convex optimization problem 

maximize i L(0' , 0^, . . . ,0^) - A-i V H©*^!! i - A.2n^(0' - diag(0^), . . . ,0^^ - diag(0^)) j. . (10) 

@',&^,...,@^eS'' 



k=l 



We refer to (10 1 as the co-hub node joint graphical lasso (CNJGL) formulation. In (10 1, Xi and A-2 are 
nonnegative tuning parameters, and q> 1. When X2—Q then this amounts to a graphical lasso optimization 
problem applied to each network separately; however, when A-2 > 0, a shared structure is encouraged among 
the K networks. In particular, {TU[ encourages network estimates that have a common set of hub nodes — that 

a2 



is, it encourages the supports of , , . . . , to be the same, and the union of a set of rows and columns. 

CNJGL can be interpreted as a node-based extension of the GGL proposal (given in Equations |2] and |4] 
and originally proposed by Danaher et al. 2012 1. While GGL encourages the K networks to share a common 
edge support, CNJGL instead encourages the networks to share a common node support. 

We now remark on an additional connection between CNJGL and the graphical lasso. 

Remark 3 If q = 1, then by Property 2 of the RCON penalty, CNJGL amounts to a modified graphical lasso 
on each network separately, with a penalty ofXi applied to the diagonal elements, and a penalty ofXi + X2/2 
applied to the off-diagonal elements. 



4. Algorithms 



The PNJGL and CNJGL optimization probl ems ([9] [T0| are convex, and so can be dkectly solved in the 
modeling environment cvx (Grant and Boyd 20 10] l7 which calls conic interior-point solvers such as SeDuMi 
or SDPT3. However, when applied to solve semi-definite programs, second-order methods such as the interior- 
point algorithm do not scale well with the problem size. 

We next examine the use of existing first-order methods to solve (j9|l and ( 10 1. Several first-order algo- 
rithms have been proposed for minimizing a least squares objective with a group lasso penalty (as in"^ 
and Lin ,2007b) in the presence of overlapping groups ( Argyriou et al. , 20 1 1 Chen et al. , 201 1 , Mosci 1 
2010 1. Unfortunately, those algorithms cannot be applied to the PNJGL and CNJGL formulations, which 
involve the RCON penalty rather than simply a standard group lasso with overlapping groups. The RCON 
penalty is a variant of the overlap norm proposed in Obozinski et al. ( 201 l| l, and indeed those authors pro- 
pose an algorithm for minimizing a least squares objective subject to their overlap norm. However, in the 
context of CNJGL and PNJGL, the objective of interest is a Gaussian log likelihood; therefore, the algorithm 



1 Yuan] 
1 et al.] 



of |Obozinski et al. (201 1 1 cannot be easily applied. 



Another possible approach for solving (j9|l and ( 10 1 involves the use of a standard first-order method, such 
as a projected subgradient approach. Unfortunately, such an approach is not straightforward, since computing 
the subgradients of the RCON penalty involves solving a non-trivial optimization problem (to be discussed in 
greater detail in Appendix [A). Similarly, a proximal gradient approach for solving (|9]) and ( [TQ| is challenging 
because the proximal operator of the combination of the overlap norm and the £1 norm has no closed form. 
To overcome the challenges outlined above, we propose to solve the PNJGL and CNJGL formulations 



using an alternating directions method of multipliers algorithm (ADMM; see e.g. Boyd et al. 2010 1. 
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4.1 The ADMM Approach 

Here we briefly outline the ADMM approach for a general optimization problem, 

minimize e(X) + h(X) 
subject to X £ X. 

ADMM is attractive in cases where the proximal operator of g{X) + h{X) cannot be easily computed, but 
where the proximal operator of g{X) and the proximal operator of h{X) are easily obtained. The approach is 
as follows ( |Boyd et al.|[20T0HEckstein and Berts"ekasl[T992l[Gabay and Mercier| [1976) 1: 



(12) 



1 . Rewrite the optimization problem ( [TT] i as 

minimize g{X) + h{Y) 
subject to X e X, X ^Y, 

where here we have decoupled g and h by introducing a new optimization variable, Y. 

2. Form the augmented Lagrangian to ( [T2] i by first forming the Lagrangian, 

LiX,Y,A) = g{X)+h{Y) + {A,X - Y), 
and then augmenting it by a quadratic function of X ~Y, 

Lp{X,Y,A)^LiX,Y,A) + ^\\X-Y\\l, 

where p is a positive constant. 

3. Iterate until convergence: 

(a) Update each primal variable in turn by minimizing the augmented Lagrangian with respect to that 
variable, while keeping all other variables fixed. The updates in the kth iteration are as follows: 

^ argminL(,(X,}'^A'^) (13) 

^ iiigmmLp{x''+\Y,A'') (14) 

(b) Update the dual variable using a dual-ascent update, 

^k+i ^ A* + p(z*^+i-y'^+i). (15) 

4.2 ADMM Algorithms for PNJGL and CNJGL 

We assume that ||.|| is an li/iy norm. Here we outUne the ADMM algorithms for the PNJGL and CNJGL 
optimization problems; we refer the reader to Appendix |F| for detailed derivations of the update rules. 

4.2. 1 ADMM Algorithm for PNJGL 

Here we consider solving PNJGL with K = 2; the case K > 2 is a straightforward extension. To begin, we 
note that ^ can be rewritten as 

maximize \ L{@\@^) -X2t\\V ^f.^ 
subject to ©'-©2=y + y^. 
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We now reformulate ( [T6| ) by introducing new variables, so as to decouple some of the terms in the objective 
function that are difficult to optimize jointly: 

minimize \ -L{&\@^) +Xi\\Z'\\i +Xi\\Z% +l2y\\V i\\A 

0'G5^+,Q2e5P_^,zi,z2y,w [ jri J (17) 

subjectto 0' -02 = y +w,y = w^,©' =z\&^=z^. 



The augmented Lagrangian to ( 17 1 is given by 



- L(0',02)+Xi||zi||i+Xi||z2||i+X2£||y,||, + (F,0'-02-(y+w)) 

7=1 

+ (G,y-lvO + (e^0'-z'> + (e^0^-z^) + fl|0'-0^-(v+w)||2 
+ Uv-w^', + ^&'-z% + l\\&^-zm. 



(18) 



In ( |T8] l there are six primal variables and four dual variables. Based on this augmented Lagrangian, the 
complete ADMM algorithm for (j9|l is given in Algorithm [T] in which the operator Expand is given by 



Expand(A,p,n^) = argmin{-n^logdet(0) + p|10-A|l^} ^ i D+ J + , 

2 V V P / 

where UDU^ is the eigenvalue decomposition of a symmetric matrix A, and as mentioned earlier, rik is the 
number of observations in the Mi class. The operator 1^ is given by 



■r,(A,X) = argmin I -A||2 +X f 



and is also known as the proximal operator corresponding to the li/ic/ norm. For q = 1,2,°°, "Tg takes a 
simple form (see e.g. Section 5 of Duchi and Singer| |2009 ). 

Algorithm 1: ADMM algorithm for the PNJGL optimization problem ^ 

input: p>0,^> 1,W>0; 

Initialize: Primal variables to the identity matrix and dual variables to the zero matrix; 
for t = 7.fmax do 

while Not converged do 

0i ^Expand (i(02+y + W+Zi) -^(ei+«i5i+F),p,«i) ; 



02^ Expand (i(0i-(y+W)+z2)-2L(e2+„252-F),p,«2); 



Z-^ri{& + f,f] for/=l,2; 



P ' P 

y^T,(i(W^-W + (0'-02)) + i,(F-G),|); 

i(y^-y + (0i-02)) + ^(F + G^); 
F ^F + p{&^ -&^-{v+w)y, 
G^G+p{v -w'^y, 

Q' ^Q' + P (0' - Z') for / = 1 , 2 
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4.2.2 ADMM Algorithm FOR CNJGL 



The CNJGL formulation in ( 10 1 is equivalent to 



minimize 

©' ,©2 0*^63'' .V ,v2 y*^GM;'x/' 



subject to 



-L(0S02,...,0^)+xi£ii0'iii+x2i: 
'■=1 )=i 



0'-diag(0') = V' + (V') foii^l,...,K. 

















> 














j 





(19) 



One can easily see that the problem ( 19 1 is equivalent to the problem 



maximize 



-L(0',02,...,0'^)+Xi£||0'||i+X2£ 

!=1 .7=1 



y' - diag(yi) 

y2-diag(y2) 
y'^-diag(y^) 



subject to 



& ^V' + {Vy for 1,2,...,/:, 



in the sense that the optimal solution {V} to ( 19 1 and the optimal solution {V'} to (20i have the following 
relationship: V — V' — diag(y') for ; = 1,2, .. . ,K. We now present an ADMM algorithm for solving (20i. 
We reformulate ( 20 1 by introducing additional variables in order to decouple some terms of the objective that 
are difficult to optimize jointly: 



minimize 

©'gS'|_i_,Z',V',W'gKpx'' 



subject to 



f «;(-logdet(0') +trace(5'0')) +Xi £ ||Z'||i +X2 f 
i=i i=i j=i 



y' -diag(yi) 
y2-diag(y2) 



y^-diag(y^) 

0' = V' + W, V" = (Wy , 0' = Z' for / = 1 ,2, . . . 



The augmented Lagrangian to (21 1 is given by 



K K p 

£n,(-logdet(0') +trace(5'0')) +Xi £ HZ'Hi +X2 £ 
i=i i=i j=i 



y'-diag(yi) 
y2-diag(y2) 



y'^-diag(y^) 

£ { {F\& - {r + w')> + {G\r - {w'f) + {Q\& - z') } + 



K 

I 

i=\ 



(22) 



e f {||0' - (y' + + ||y' - + 1|0' -z'||2 } , 
1=1 
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The corresponding ADMM algorithm is given in Algorithm|2] 



Algorithm 2: ADMM algorithm for the CNJGL optimization problem ( [T0| 



input: p>0,/j> > 0; 

Initialize: Primal variables to the identity matrix and dual variables to the zero matrix; 
for t = Ltmioi do 

whUe Not converged do 

©'■ ^ Expand ( \ (V ' + W' + Z') - ^ (fi' + niS' +F'),(),n^ for / = 1 , . . . , /T; 

Z'VTife' + f for/=l, 



I - (2' + ni5' + ) , p , n, j for / = 1 , 



Let C = 




T 


-W' + &) + j^{F'-G)fox 


i=l,...,K- 


r 1 




/ 


C^-diag(C^) 






- diag(C') " 




^% 




C2-diag(c2) 


'2p 


+ 


diag(c2) 






v 


_ C'^-diag(C'^) _ 






. diag(C^) _ 



G' ^ G 
Q' ^ Q 



-V'+&) 



' (F' + (G'y) for/ 



2p 



1,...,^:; 



+p(0'-(y'+w)) for/ = 1,...,/:; 

+ p(y'-(W'/)for/=l,. 
+ p(©'-Z') foTi^l,...,K 



4.2.3 Numerical Issues and Run-Time of the ADMM Algorithms 

We set /J = 5, p = 0.5 and fmax — 1000 in the PNJGL and CNJGL algorithms. In our implementation of these 
algorithms, the stopping criterion for the inner loop (corresponding to a fixed p) is 

max <^ ^-^^ . „ ' } < e, (23) 

where denotes the estimate of 0' in the kth iteration of the ADMM algorithm, and e is a tolerance that 

is chosen in our experiments to equal 10^^. 

The per-iteration complexity of the ADMM algorithms for CNJGL and PNJGL (with K ^ 2) is 0{p^); 
this is the complexity of computing the S VD. On the other hand, the complexity of an interior point method is 
0{p^). In a small example with p — 30, run on an Intel Xeon X3430 2.4Ghz CPU, the interior point method 
(using cvx, which calls Sedumi) takes 7 minutes to run, while the ADMM algorithm for PNJGL, coded 

in Matlab, takes only 0.58 seconds. When p = 50, the times are 3.5 hours and 2.1 seconds, respectively. 

' 1 ' 2 - 1 - 2 

Let , and , denote the solutions obtained by ADMM and cvx, respectively. We observe that on 
average, the error max \ 110' — 0'|If/||0'||f > is on the order of 10^^. 

(G{L2} I "J 

We now present more extensive timing results for the ADMM algorithms for PNJGL and CNJGL. We 
ran experiments with p = 100,200,500 and with n\ — n2— p/2. We generated synthetic data as described 
in Section |6] Results are displayed in Figures |4|a)-(d), where the panels depict the run-time and number 
of iterations required for convergence, as a function of A-i, and with A,2 fixed. The number of iterations 
required for convergence is computed as the total number of inner loop iterations performed in Algorithms [T] 
and|2] From Figures |4|b) and (d), we observe that as p increases from 100 to 500, the run-times increase 
substantially, but never exceed several minutes. 

Figure |4|a) indicates that for CNJGL, the total number of iterations required for convergence is small 
when A-i is small. In contrast, for PNJGL, Figure Qc) indicates that the total number of iterations is large 
when X\ is small. This phenomenon results from the use of the identity matrix to initialize the network 
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estimates in the ADMM algorithms: when A-i is small, the identity is a poor initialization for PNJGL, but a 
good initialization for CNJGL (since for CNJGL, A,? induces sparsity even when Xi = 0). 




Figure 4: (a): The total number of iterations for the CNJGL algorithm, as a function of A-i. (b): Run-time 
(in seconds) of the CNJGL algorithm, as a function of Xi . (c)-(d): As in (a)-(b), but for the PNJGL 
algorithm. All results are averaged over 20 random generations of synthetic data. 



4.2.4 Convergence of the ADMM Algorithm 



Problem (12 1 involves two groups of primal variables, X and Y; in this setting, convergence of ADMM 



has been established (see e.g. |Boyd et al.] |2010[ |Mota et al-l |2011[ ). However, the PNJGL and CNJGL 
optimization problems involve more than two groups of primal variables, and convergence of ADMM in this 
setting is an ongoing area of research. |Han and Yuan| ( [20T2| and |Hong and Luo] ( |2012| l show convergence of 
ADMM with more than two groups of variables under assumptions that do not hold for CNJGL and PNJGL. 
Under very minimal assumptions. He et al. ( 2012[ l proved that a modified ADMM algorithm (with Gauss- 
Siedel updates) converges to the optimal solution for problems with any number of groups. A more general 
set of conditions for convergence of the ADMM algorithm with more than two groups is left as a topic for 
future work. We also leave to future work a reformulation of the CNJGL and PNJGL problems as consensus 
problems, for which an ADMM algorithm involving two groups of primal variables can be derived, and for 



which convergence is therefore guaranteed (as in Ma et al. 2013 i. 



5. Algorithm-Independent Computational Speed-Ups 

The ADMM algorithms presented in the previous section work well on problems of moderate size. In order to 
solve the PNJGL or CNJGL optimization problems when the number of variables is large, a faster approach is 
needed. We now describe conditions under which any algorithm for solving the PNJGL or CNJGL problems 
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can be sped up substantially, for an appropriate range of tuning parameter values. Our approach mirrors 
previous results for the graphical lasso (iWitten et al. 201 l[ |Mazumder and Hastie 2012 1, and FGL and GGL 



(Danaher et al. 2012). The idea is simple: if the solutions to the PNJGL or CNJGL optimization problem are 



block-diagonal (up to some permutation of the features) with shared support, then we can obtain the global 
solution to the PNJGL or CNJGL optimization problem by solving the PNJGL or CNJGL problem separately 
on the features within each block. This can lead to massive speed-ups. For instance, if the solutions are block- 
diagonal with L blocks of equal size, then the complexity of our ADMM algorithm reduces from 0{p^) per 
iteration, to 0{{p/Vf') per iteration in each of L independent subproblems. Of course, this hinges upon 
knowing that the PNJGL or CNJGL solutions are block-diagonal, and knowing the partition of the features 
into blocks. 

In Sections |5.1[|5.3| we derive necessary and sufficient conditions for the solutions to the PNJGL and 
CNJGL problems to be block-diagonal. Our conditions depend only on the sample covariance matrices 
^...,S'^ and regularization parameters A-i , A,2. These conditions can be applied in at most 0{p^) operations. 
In Section 5.4 we demonstrate the speed-ups that can result from applying these sufficient conditions. 

We now introduce some notation that will be used throughout this section. Let (/i ,/2, . . . be a partition 
of the index set {1,2, . . . ,/:>}, and let T = [jf^i{Ii x /,}. Define the support of a matrix 0, denoted by supp(0), 
as the set of indices of the non-zero entries in 0. We say is supported on T if supp(0) C T . Note that any 
matrix supported on T is block-diagonal subject to some permutation of its rows and columns. Let \ T\ denote 
the cardinality of the set T. The scheme is displayed in Figure |5] In what follows we use an £i/" 
the RCON penalty, with ^ > 1, and let ^ + ^ = 1. 



Figure 5: A p x p matrix is displayed, for which /i,/2,/3 denote a partition of the index set {1,2, . . .,p}. 
T = X /,} is shown in red, and T'^ is shown in gray. 

5.1 Conditions for PNJGL Formulation to Have Block-Diagonal Solutions 

In this section, we give necessary conditions and sufficient conditions on the regularization parameters A-i , A-2 
in the PNJGL problem ^ so that the resulting precision matrix estimates ©',..., 0^ have a shared block- 
diagonal structure (up to a permutation of the features). 

We first present a necessary condition for and 0^ that minimize ^ with = 2 to be block-diagonal. 

Theorem 4 Suppose that the matrices 0^ and 0^ that minimize with K ~ 2 have support T. Then, if 
q > I, it must hold that 

riklS'ljl < Xi +X2/2 y{ij) G r\ forfc 1,2, and (24) 

\niSl + n2Sjj\ < 2X1 V(/,;) G T'. (25) 

Furthermore, if q > 1, then it must additionally hold that 

At E l^f/l <^i + ^ ^iAV , forfc=l,2. (26) 
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Remark 5 If\T'^\ ~ 0{p'') with r> I, then as p °°, {26\ simplifies to ^L(i-,;)Gr^ I'^f,! < ^i- 

We now present a sufficient condition for 0\ . . . , ©''^ that minimize to be block-diagonal. 

Theorem 6 For q > 1, a sufficient condition for the matrices 
support T is that 



. , © that minimize i9l to each have 



nk\SU<h VO^jOeT^'', fork=l,...,K. 



Furthermore, if q = 1 and K — 2, then the necessary conditions {24 \ and (251 are also sufficient. 

When q= \ and K ^2, then the necessary and sufficient conditions in Theorems|4]and[6]are identical, as was 
previously reported in |Danaher et al.| ( [2012| l. In contrast, there is a gap between the necessary and sufficient 
conditions in Theorems |4|and |6| when q> I and ^2 > 0. Wlien ^2 = 0, the necessary and sufficient conditions 
in Theorems |4]and|6]reduce to the results laid out in Witten et al. ( 2011[ l for the graphical lasso. 

5.2 Conditions for CNJGL Formulation to Have Block-Diagonal Solutions 

In this section, we give necessary and sufficient conditions on the regularization parameters A,i,A,2 in the 
CNJGL optimization problem (10 1 so that the resulting precision matrix estimates ©',..., ©^ have a shared 
block-diagonal structure (up to a permutation of the features). 



/V } /V 2 ^ K 

Theorem 7 Suppose that the matrices ,0 , . . . ,0 that minimize [lO) have support T. Then, if q> 1, it 
must hold that 



nk\S'lj\<'kx+'k2/2 V{i,J)eT', fork^l, 



Furthermore, if q > 1, then it must additionally hold that 



, for^= 



(27) 



(28) 



Remarks If\T''\ =0{p'') withr> 1, thenas p ^ {28 \ simplifies to ^ L(i,;)GrH'^f,l < -^i- 



1 2 AT 

We now present a sufficient condition for ,0 , . . . , that minimize ( 10 1 to be block-diagonal. 



Theorem 9 A sufficient condition for ,0 , . . . , that minimize [lO) to have support T is that 

nk\S'lj\<'ki V(/,;)er, forfc=l,...,/^. 

As was the case for the PNJGL formulation, there is a gap between the necessary and sufficient conditions 
for the estimated precision matrices from the CNJGL formulation to have a common block-diagonal support. 



5.3 General Sufficient Conditions 

In this section, we give sufficient conditions for the solution to a general class of optimization problems that 
include FGL, PNJGL, and CNJGL as special cases to be block-diagonal. Consider the optimization problem 



minimize J V ni(-logdet(0*^) + (0^5'^)) + V 110*^11 1 +>i2/i(0\ 0^) 
0' l-t=i 



(29) 



k=\ 



Once again, let T be the support of a px p block-diagonal matrix. Let 07- denote the restriction of any px p 



matrix to T; that is, {&T)ij 



I 0,7 if {iJ)&T 
[0 else 

/!(0l,...,0^)>/2(0^,...,0^) 



Assume that the function h satisfies 



for any matrices 0\ . . . , 0^ whose support strictly contains U. 



14 



Theorem 10 A sufficient condition for the matrices ©',..., that solve [29) to have support T is that 

nk\S'tj\<'ki V(/,j)er, fork^\,...,K. 

Note that this sufficient condition appHes to a broad class of regularizers h; indeed, the sufficient conditions 
for PNJGL and CNJGL given in Theorems |6] and l9] are special cases of Theorem [TO] In contrast, the necessary 
conditions for PNJGL and CNJGL in Theorems |4] and [7] exploit the specific structure of the RCON penalty. 

5.4 Evaluation of Speed-Ups on Synthetic Data 

Theorems |6] and |9]provide sufficient conditions for the precision matrix estimates from PNJGL or CNJGL to 
be block-diagonal with a given support. How can these be used in order to obtain computational speed-ups? 
We construct a px p matrix A with elements 

fl if/= /orif > A-i 
Aij=l ' ^ (30) 

' yd else 

We can then check, in 0{p^) operations, whether A is (subject to some permutation of the rows and columns) 
block-diagonal, and can also determine the partition of the rows and columns corresponding to the blocks 
(see e.g. 'TarjanJ 1972| l. Then, by Theorems [6] and [9] we can conclude that the PNJGL or CNJGL estimates 



are block-diagonal, with the same partition of the features into blocks. Inspection of the PNJGL and CN- 
JGL optimization problems reveals that we can then solve the problems on the features within each block 
separately, in order to obtain the global solution to the original PNJGL or CNJGL optimization problems. 

We now investigate the speed-ups that result from applying this approach. We consider the problem of 
estimating two networks of size p = 500. We create two inverse covariance matrices that are block diagonal 
with two equally-sized blocks, and sparse within each block. We then generate n\ = 250 observations from a 
multivariate normal distribution with the first covariance matrix, and n2 = 250 observations from a multivari- 
ate normal distribution with the second covariance matrix. These observations are used to generate sample 
covariance matrices and S^. We then performed CNJGL and PNJGL with X2 = 1 and a range of A-i values, 
with and without the computational speed-ups just described. 

Figure |6] displays the performance of the CNJGL and PNJGL formulations, averaged over 20 data sets 
generated in this way. In each panel, the x-axis shows the number of blocks into which the optimization 
problems were decomposed using the sufficient conditions; note that this is a surrogate for the value of A-i 
in the CNJGL or PNJGL optimization problems. Figure |6|a) displays the ratio of the run-time taken by the 
ADMM algorithm when exploiting the sufficient conditions to the run-time when not using the sufficient 
conditions. Figure|6|b) displays the true-positive ratio - that is, the ratio of the number of true positive edges 
in the precision matrix estimates to the total number of edges in the precision matrix estimates. Figure |6|c) 
displays the total number of true positives for the CNJGL and PNJGL estimates. Figure [6] indicates that the 
sufficient conditions detailed in this section lead to substantial computational improvements. 

6. Numerical Experiments 

In this section we illustrate the performance of PNJGL and CNJGL on both synthetic and real data. 

6.1 Simulation Study 

6.1.1 Data Generation 

We generated two synthetic networks, each of which contains a common set of p nodes. Two of these p nodes 
are perturbed across the two networks, and two are co-hub nodes that are identical across the two networks. 
In greater detail, we first created a px p symmetric matrix A with elements generated according to 



A 



ij ~i.i.d. 



with probabiUty 0.98, 

Unif([-0.6,-0.3]U [0.3,0.6]) otherwise. 
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Number of blocks Number of blocks Number of blocks 

(a) (b) (c) 

Figure 6: Speed-ups for CNJGL and PNJGL on a simulation set-up with p — 500 and «i = n2 = 250. The 
true inverse covariance matrices are block-diagonal with two equally-sized sparse blocks. The x- 
axis in each panel displays the number of blocks into which the CNJGL or PNJGL problems are 
decomposed using the sufficient conditions; this is a surrogate for A,i. The y-axes display (a): the 
ratio of run-times with and without the sufficient conditions; (b): the true positive ratio of the edges 
estimated; and (c): the total number of true positive edges estimated. 



We then duplicated A into two matrices. A' and A^. We selected two nodes at random, and for each node, we 
set the elements of the corresponding row and column of either A' or A^ (chosen at random) to be i.i.d. draws 
from a Unif([— 0.6, —0.3] U [0.3,0.6]) distribution. Thus, these two nodes are perturbed between A' and A^. 
Next, we randomly selected two nodes to serve as co-hub nodes, and set each element of the corresponding 
rows and columns in each network to be i.i.d. draws from a Unif([— 0.6, — 0.3] U [0.3,0.6]) distribution. In 
other words, these co-hub nodes are identical across the two networks. We set A J, = A ;,■ and A? = A^,, to 
ensure symmetry. In order to make the matrices positive definite, we let c — min(A,min(A'), A-min(A^)), where 
^mm(') indicates the smallest eigenvalue of the matrix. We then set (E^)^' equal to A' + (0.1 + |c|)/ and set 
(E^)^^ equal to A^ + (0.1 + |c|)/, where here / is the p >i p identity matrix. Then and T? were used as the 
covariance matrices in the simulation study. 

Next, we generated n independent observations each from a A^(0, E' ) and a A^(0, E^) distribution, and used 
them to compute the sample covariance matrices and S^. We used p = 100, and n e {25,50, 100,200}. 

6.1.2 Simulation Results 

We now define several metrics used to measure algorithm performance. We wish to quantify each algorithm's 
(1) recovery of the support of the true inverse covariance matrices, (2) successful detection of co-hub and 
perturbed nodes, and (3) error in estimation of — (E')^' and = (E2)^^ Details are given in Tablejl] 
These metrics are discussed further in Appendix [G| 

We compared the performance of PNJGL to its edge-based counterpart FGL, as well as to graphical lasso 
(GL). We compared the performance of CNJGL to GGL and GL. We expect CNJGL to be able to detect 
co-hub nodes (and, to a lesser extent, perturbed nodes), and we expect PNJGL to be able to detect perturbed 
nodes. (The co-hub nodes will not be detected by PNJGL, since they are identical across the networks.) 

The simulation results are displayed in Figures [7] and |8] Each row corresponds to a sample size while 
each column corresponds to a performance metric. In Figure |7] PNJGL, FGL, and GL are compared, and in 
Figure [8] CNJGL, GGL, and GL are compared. Within each plot, each colored line corresponds to the results 
obtained using a fixed value of X2 (for either PNJGL, FGL, CNJGL, or GGL), as A-i is varied. Recall that GL 
corresponds to any of these four approaches with X2 — 0. Note that the number of positive edges (defined in 
Table [TJ decreases approximately monotonically with the regularization parameter A-i, and so on the x-axis 
we plot the number of positive edges, rather than A-i, for ease of interpretability. 

In Figure |7] we observe that PNJGL outperforms FGL and GL for a suitable range of the regularization 
parameter X2, in the sense that for a fixed number of edges estimated, PNJGL identifies more true positives. 
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(1) 


Positive edges: L</ (H\®h\ > ^o} + 1{|©?,| > to}) 

True positive edges: L</ f 111®,',! > and |0i,.| > fo} + 1{|©2.| > fo and |02.|>fo}) 


(2) 


Positive perturbed columns (PPC): 

PNJGL: Iti l{||y_,i||2 > t,}- FGL/GL: iJL; 1{||(0' -©^,,,112 > f.} 
True positive perturbed columns (TPPC): 

PNJGL: l{||t>-,.,||2 > fj; FGL/GL: L'g/, 1{||(©' - ^^)-u\\2 > 
where Ip is the set of perturbed column indices. 

rUSlLlVc LO-IIUD LUlUlllIlS ^rV^V^^ 

CNJGL: Lf^il{||t/i,.,.||2 >fsand||y2,,.||2 >f,};GGL/GL: iJLi 1 { l|0i,.,l|2 > f.and ||0^ ,||2 > fsj 
True positive co-hub columns (TPCC): 

CNJGL: Le/,.l{ll^-,,,ll2 >f.vand||y2,,||2 >r,};GGL/GL: L-e4l{ll©i,-ill2 > f.and|10',,|l2 > f.,} , 
where Ic is the set of co-hub column indices. 


(3) 


Error: ^Li<j{@h ~ ®]jY + \/l.<j{®1 " ©f;)" 



Table 1: Metrics used to quantify algorithm performance. Here ©' and denote the true inverse covariance 

1 2 

matrices, and © and © denote the two estimated inverse covariance matrices. Here 1{A} is an 
indicator variable that equals one if the event A holds, and equals zero otherwise. (1) Metrics based 
on recovery of the support of ©' and ©^. Here to = 10^^. (2) Metrics based on identification of 
perturbed nodes and co-hub nodes. The metrics PPC and TPPC quantify node perturbation, and 
are applied to PNJGL, FGL, and GL. The metrics PCC and TPCC relate to co-hub detection, and 
are applied to CNJGL, GGL, and GL. We let — /u + 5.5a, where fi is the mean and a is the 

standard deviation of {||V_,,,||2}f=i (PPC or TPPC for PNJGL), {IK©' - 0^)-,,,i|2}f=i (PPC or 

TPPC for FGL/GL), {WVlijhVU and {WVlJiVU (PPC or TPPC for CNJGL), or {||0i,,,||2}f=i 

2 

and {||0_, ,||2}'Li (PPC or TPPC for GGL/GL). However, results are very insensitive to the value 
of fj, as is shown in AppendixO (3) Frobenius error of estimation of ©' and ©^. 



correctly identifies a greater ratio of perturbed nodes, and yields a lower Frobenius error in the estimates of 
©' and ©^. In particular, PNJGL performs best relative to FGL and GL when the number of samples is the 
smallest, i.e. in the high-dimensional data setting. Unlike FGL, PNJGL fully exploits the fact that differences 
between ©' and ©^ are due to node perturbation. Not surprisingly, GL performs worst among the three 
algorithms, since it does not borrow strength across the conditions in estimating ©' and ©^. 

In Figure |8] we note that CNJGL outperforms GGL and GL for a suitable range of the regularization 
parameter A,2. In particular, CNJGL outperforms GGL and GL by a larger margin when the number of 
samples is the smallest. Once again, GL performs the worst since it does not borrow strength across the two 
networks; CNJGL performs the best since it fully exploits the presence of hub nodes in the data. 

We note one interesting feature of Figure [8] the colored lines corresponding to CNJGL with very large 
values of A-2 do not extend beyond around 400 positive edges. This is because for CNJGL, a large value of 
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hi induces sparsity in the network estimates, even if A-i is small or zero. Consequently, it is not possible to 
obtain a dense estimate of 0' and 0^ if CNJGL is performed with a large value of X2. In contrast, in the case 
of PNJGL, sparsity is induced only by Xi, and not at all by A,2. We note that a similar situation occurs for the 
edge-based counterparts of CNJGL and PNJGL: when GGL is performed with a large value of A,2 then the 
network estimates are necessarily sparse, regardless of the value of A-i . But the same is not true for FGL. 

6.2 Real Data Analysis 

In this experiment, we aim to reconstruct the gene regulatory networks of two subtypes of the cancer glioblas- 
toma multiforme (GBM), as well as to identify genes that can improve our understanding of the disease. 
Cancer is caused by somatic (cancer-specific) mutations in the genes involved in various cellular processes 
including cell cycle, cell growth, and DNA repair; such mutations can lead to uncontrolled cell growth. We 
will show that PNJGL and CNJGL can be used to identify genes that play central roles in the development 
and progression of cancer PNJGL tries to identify genes whose interactions with other genes vary signifi- 
cantly between the subtypes. Such genes are likely to have many somatic mutations. CNJGL tries to identify 
genes that have interactions with many other genes in all subtypes. Such genes are likely to play an important 
role in controlling other genes' expression, and are typically called regulators. 

We applied the proposed methods to a publicly available gene expression dataset that measures mRNA 
expression levels of 1 1,861 genes in 220 tissue samples from patients with GBM (Verhaak et al. 2010| l. Each 



patient has one of four subtypes of GBM - Proneural, Neural, Classical, or Mesenchymal. We selected two 
subtypes, Proneural (53 tissue samples) and Mesenchymal (56 tissue samples), that have the largest sample 
sizes. All analyses were restricted to the corresponding set of 109 tissue samples. 

To evaluate PNJGL's ability to identify genes with somatic mutations, we focused on the following 10 



genes that have been suggested to be frequently mutated across the four GBM subtypes (Verhaak et al. 
[20T0| ): TP53, PTEN, NFl, EGFR, IDHl, P1K3R1, RBI, ERBB2, P1K3CA, PDGFRA. We then considered 
inferring the regulatory network of a set of genes that is known to be involved in a single biological process. 



based on the Reactome database (Matthews et al. 2008). In particular, we focused our analysis on the "TCR 



signaling" gene set, which contains the largest number of mutated genes. This gene set contains 34 genes, of 
which three (PTEN, P1K3R1, and P1K3CA) ai-e in the list of 10 genes suggested to be mutated in GBM. We 
applied PNJGL with <7 = 2 to the resulting 53 x 34 and 56 x 34 gene expression datasets, after standardizing 
each gene to have variance one. As can be seen in Figure |9j the pattern of network differences indicates 
that one of the three highly-mutated genes is in fact perturbed across the two GBM subtypes. The perturbed 
gene is PTEN, a tumor suppressor gene, and it is known that mutations in this gene are associated with the 



development and progression of many cancers (see e.g. ,Chalhoub and Baker 2009j). 



To evaluate the performance of CNJGL in identifying genes known to be regulators, we used a manually 
curated list of genes that have been identified as regulators in a previous study (Gentles et al. 2009); this 
list includes genes annotated as transcription factors, chromatin modifiers, or translation initiation genes. We 
then selected a gene set from Reactome, called "G2/M checkpoints," which relevant to cancer and contains a 
large number of regulators. This gene set contains 38 genes of which 15 are regulators. We applied CNJGL to 
the resulting 53 x 38 and 56 x 38 gene expression datasets, to see if the 15 regulators tend to be identified as 



co-hub genes. Figure 10 indicates that all four co-hub genes (CDC6, MCM6, CCNBl and CCNB2) detected 



by CNJGL are known to be regulators. 

7. Discussion 

We have proposed node-based learning of multiple Gaussian graphical models through the use of two con- 
vex formulations, perturbed-node joint graphical lasso and cohub node joint graphical lasso. Both of these 
formulations rely on the use of the row-column overlap norm penalty, which when applied to a matrix en- 
courages a support that can be expressed as the union of a few rows and columns. We solve the convex 
optimization problems that correspond to PNJGL and CNJGL using the ADMM algorithm, which is more 
efficient and scalable than standard interior point methods and also first-order methods such as projected sub- 
gradient. We also provide necessary and sufficient conditions on the regularization parameters in CNJGL and 
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Figure 7: Simulation results for PNJGL with q = l, FGL, and GL, for (a) n = 25, (b) n = 50, (c) n = 100, (d) 
n = 200, when p = 100. Each colored line corresponds to a fixed value of A-2, as A,i is varied. Axes 
are described in detail in Table[T] Results are averaged over 100 random generations of the data. 
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Figure 8: Simulation results for CNJGL with q = 2, GGL, and GL, for (a) n = 25, (b) n = 50, (c) n = 100, 
(d) n = 200, when p = 100. Each colored line corresponds to a fixed value of X2, as A,i is varied. 
Axes are described in detail in Table [T] Results are averaged over 100 random generations of the 
data. 
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Figure 9: Real data analysis for PNJGL with q = 2. The sample covariance matrices 5' and were generated 
from samples with two cancer subtypes, with sizes ni = 53 and 112 = 56. Only the 34 genes 
contained in the Reactome "TCR Signaling" pathway were included in this analysis. Of these 
genes, three are frequently mutated in GBM: PTEN, PIK3R1, and P1K3CA. These three genes 
correspond to the last three columns in the matrices displayed (columns 32 through 34). PNJGL 
was performed with A-i = and A-2 = 2. We display (a): the estimated matrix ; (b): the estimated 

"2 " 1 " 2 

matrix ; and (c): the difference matrix — . The gene PTEN is identified as perturbed. 
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Figure 10: Real data analysis for CNJGL with q = 2. The sample covariance matrices and were gener- 
ated from samples with two cancer subtypes, with sizes ni =53 and n2 = 56. Only the 38 genes 
contained in the Reactome "G2/M checkpoints" pathway were included in this analysis. Of these 
genes, 15 have been previously identified as regulators. These 15 genes correspond to the last 
15 columns in the matrices (columns 24 through 38). CNJGL was performed with A-i = 13 and 

1 2 

A,2 =410. We display (a): the estimated matrix ; (b): the estimated matrix . Four of the 
regulator genes are identified by CNJGL. These genes are CDC6, MCM6, CCNBl, and CCNB2. 



PNJGL so that the optimal solutions to these formulations are block diagonal, up to a permutation of the rows 
and columns. When the sufficient conditions are met, any algorithm that is applicable to these two formu- 
lations can be sped up by breaking down the optimization problem into smaller subproblems. Our proposed 
approaches lead to better performance than two alternative approaches: learning Gaussian graphical models 
under the assumption of edge perturbation (Danaher et al. 2012| l, or simply learning each model separately. 



We next discuss possible directions for future work. 

• We have focused on promoting a row-column structure in either the difference of the networks or in the 
networks themselves. However, the RCON penalty can be generalized. For instance, we might believe 
that particular gene pathways tend to be simultaneously activated across multiple distinct conditions; a 
modification of the RCON penalty can be used in this setting. 
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• Convergence of the ADMM algorithm in the presence of more than two sets of variable updates has 
only been addressed partially in the literature. However, the PNJGL and CNJGL formulations can 
be rewritten along the lines of an approach given in [Ma et al. ( 2013[ l, so that only two sets of primal 
variables are involved, so that convergence is therefore guaranteed. 

• Transcriptional regulatory networks involve tens of thousands of genes. Hence it is imperative that 
our algorithms scale up to large problem sizes. Speeding up the ADMM algorithm would prove quite 
useful (see e.g. [Goldstein et al.| [2012^ . Alternatively, other fast algorithms such as the accelerated 
proximal gradient method or second order methods can be explored. 

• In Section [5] we presented a set of conditions that allow us to break up the CNJGL and PNJGL opti- 
mization problems into many independent subproblems. However, there is a gap between the necessary 
and sufficient conditions that we presented. Making this gap tighter could potentially lead to greater 
computational improvements. 

• Tuning parameter selection in high-dimensional unsupervised settings remains an open problem. An 



existing approach such as stability selection (Meinshausen and Buhlmann 2010 1 could be applied in 
order to select the tuning parameters A-i and A,2 for CNJGL and PNJGL. 

The CNJGL and PNJGL formulations are aimed at jointly learning several high-dimensional Gaussian 
graphical models. These approaches could be modified in order to learn high-dimensional graphical 

[2012) 1. 



models involving other types of variables (see e.g. Ravikumar et al. 2010 Yang et al 



Mat lab code implementing CNJGL and PNJGL will be made available on the authors' websites. 
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Appendix A. Dual Characterization of RCON 

Lemma 11 The dual representation ofQ. is given by 

K 

= max E(^'.0'> 
h\...MeSLi''-'^p ~l 



subject to 



< ^forj^ l,2,...,p, 



where || • || denotes any norm, and \ \ • ||* its corresponding dual norm. 
Proof Recall that £2 is given by 



n(0\...,0'')= min 



subject to ©' = y + {V'Y , / = 1,2, ...,K. 



(31) 



(32) 
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LetZ = 



where Z'' ^W^p. Then ( 32 1 is equivalent to 



n(0',...,0'^) = min max Y{Z'X), 

V>:&=V'' + (V''Y ,i=\....,K Z:||Z||*<1 ^ 



(33) 



1=1 



where 



is the dual norm to 



Since in ( 33 1 the cost function is bilinear in the two sets of variables and 



the constraints are compact convex sets, by the minimax theorem, we can swap max and min to get 

n(0\...,0^) 



K 

max min 'S\{Z\V') 

Z:\\Z\\,<1 V':©'=y'"+(y'y,i=l,...,A:,^l 



(34) 



Now, note that the dual to the inner minimization problem with respect to V 



V in ( 34 1 is given by 



maximize 

Al ...,A'^ 



f (A',0') 



(35) 



subject to Z' = A' + (A'y , / = 1,2, . . .,K. 



Plugging ( 35 I into ( 34 1, the lemma follows 



By the definition of the subgradient, the subdifferential of i2 is given by the set of all jiT-tuples (A^, . . . , A^) 
that are optimal solutions to problem (31 1. Note that if (A^, . . . ,A^) is an optimal solution to (31 1, then any 
(a' +Z' , . . . +Z'^) with skew-symmetric matrices Z^, . . . ,Z*- is also an optimal solution. 



Appendix B. Proof of Theorem |4] 

The optimality conditions for the PNJGL optimization problem (|9]l with K = 2 are given by 



-«i(0')- 

-«2(0')- 



+ n2S^+Xir^-- 



-XiA 





0, 



(36) 
(37) 



where and are subgradients of ||0' || i and ||0^|| i, and (A, —A) is a subgradient of — 0^). (Note 

that — 0^) is a composition of D.^ with the linear function 0' — 0^, and apply the chain rule.) Also 

note that the right-hand side of the above equations is a zero matrix of size px p. 

Now suppose that 0' and 0^ that solve (joj) are supported on T. Then since (0^)^' , (0^)^' are supported 
on r, we have that 



n2Sjc 



-A-iT^. -A.2A7'. = 0. 



Summing the two equations in ([38]l yields 



{niS 



-n240+^i(rr+r|o = o. 



It thus follows from (R9ll that 



\niSl-c +n2Slc\\oo <Xi\\r\-c +rlc\\oo <2Xi, 



(38) 



(39) 



(40) 



where here || • ||oo indicates the maximal absolute element of a matrix, and where the second inequality in (40 1 
follows from the fact that the subgradient of the £1 norm is bounded in absolute value by one. 



We now assume, without loss of generality, that the A that solves ( [36] l and ( 37 1 is symmetric. (In fact, one 
can easily show that there exist symmetric subgradients P', F^, and A that satisfy (36 1 and (37 1.) Moreover, 
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that ||(A + A-^)y||i < 1. Therefore, ||Aj||s < 5. Using Holder's inequahty and noting 



recall from Lemma 1 1 
that Ijyll 1 = {y, sgnlyjjfoi a vector y, we obtain 

ll^rlli = 
< 

< 

< 



(Arssgn(A7'0) 
||sgn(ArOII,||ArHI. 

iri^llArHI. 



(41) 



1 1 



where the last inequality follows from the fact that \\A\\l = Lj=i W-^jW'l 5; Pi^Y^ ^i^d where in ( pTj i, ||A||^ and 
||A||,s indicate the ig and 4 norms of vec(A) respectively. 
From ( 38 1, we have for each k E {1,2} that 



< Xi||r*.||i+X2|]ArH 



(42) 



where the last inequality follows from the fact that the elements of T are bounded in absolute value by one, 
and (41 1. The theorem now follows by noting from (38 1 that for each k E {1,2}, 



nk\\Sjc 



<h\\rU 



-A.2||A7'H|oo<?ll + - 



Appendix C. Proof of Theorem |7] 



Proof The optimahty conditions for the CNJGL problem ( 10 1 are given by 

- n^. (0*) " ' + n^^*^ + r*' + A.2 A*^ = 0, k=l,... 



(43) 



where is a subgradient of ||0*^|| 1, the /T- tuple (A', . . . ,A'^) is a subgradient of £1,^(0' -diag(0'),. . . ,0^- 
diag(0'^)), and the right-hand side is a /? x p matrix of zeros. We can assume, without loss of generality, that 
the subgradients and A*^ that satisfy (43 1 are symmetric, since Lemma [ill indicates that if (A',...,A^) 
is a subgradient of £2^(0' - diag(0'), . . . ,0^^ - diag(0*)), then ((A' + (A^ )/2, • • • , (A^ + (A^)^)/2) is a 
subgradient as well. 

Now suppose that 0', . . . ,0^^ that solve ( lOl are supported on T. Since (0^)^' is supported on T for all 
k, we have 



tlkSjc 



-XiTrc 



Hi jc -f- A,2Aj-c ~ 0. 

We use the triangle inequality for the £1 norm (applied elementwise to the matrix) to get 

nA-|l4^||i <?^l|l4'-■||l+>^2||A^'■||l■ 



(44) 



(45) 



We have ||F*^||oc < 1 since F'^ is a subgradient of the £1 norm, which gives ||F^.|li < \T'\. Also A*^ is a part 
of a subgradient to £2^, so by Lemma 1 1 ||(A^' + (A^)"^) ,-|Lv < 1 for j E {1^2, . . . ,p}. Since A* is symmetric. 



we have that 11 A; 



< 



Using the same reasoning as in (41 1 of Appendix B we obtain 



\\A'T4i<:,\n^'p-^ 



Combining ( 45 1 and ( 46 1 yields 



nk\\SU\i<'>^i\T' 



(46) 



(47) 
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The theorem follows by noting from ([44| that 



<Xi||r*..|U + X2||A^j^.||oo<Xi + ^. 



(48) 



Appendix D. Proof of Theorem 10 



Assume that the sufficient condition holds. In order to prove the theorem, we must show that 
f logdet(0'^) + (0^ 5^) )) + Xi II ©'II 1 + ^2/2(0' , . . . , 0^) 

k=\ k=\ 

> f (- logdet(0*-) + {&'r,S')))+hY^\ I &'t 1 1 1 + Xz^©]^ , . . . , 0f ) . 

k=l k=l 



By assumption, 

We will now show that 

or equivalently, that 



Hk 



/^(0^...,0^)>/2(0^,...,0f). 



;0^5*)+Xl||0*■|ll >«,(0*^,5^)+xiii0^riii, 



-«;t(0^.,5'^) <A.i||0^, 



■rill- 



Note that {&jc,S'') = (0yc,5^c). By the sufficient condition, «<:||5^c||oc < Xi. So 



-ni(0^.,5*^) = -nA.(0*p.,4<. 



< ||n,5^.||oo||0*r.||i 

< ^ilie'r^lli. 



So ( |52] i holds, and hence ( |5T| i holds. 

Finally, we apply Fischer's inequality, which states that det(0'^) < det(0j), and so 

-logdet(0^) > -logdet(0^). 



Combining (50 1, (51 1, and (54 1, the theorem holds 



(49) 

(50) 
(51) 
(52) 



(53) 



(54) 



Appendix E. Connection Between RCON and Qbozinski et al. (20111 



We now show that the RCON penalty with <7 = 2 is a special case of the overlap norm of Obozinski et al 



( 201 l| l. For simplicity, here we restrict ourselves to the RCON with K — 1. The general case of A" > 1 can be 
shown via a simple extension of this argument. 

Given any symmetric p x p matrix 0, let 0a be the p x p upper-triangular matrix matrix such that 
= 0A+0A- That is. 



(0 



A)kl 



&ki ifk<l 
&kk/2 \fk = l 
iffe>/. 
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(55) 



Note that 



Now define p groups, ^i, . . . ,gp, each of which contains p variables, as displayed in Figure 
these groups overlap: \fk<l, then the (A;, /) element of a matrix is contained in both the kih and /th groups. 
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Figure 11: Depiction of groups gi,. . . for a 5 x 5 matrix. Each group's elements are shown in blue. 



The overlap norm corresponding to these groups is given by 



L liv^l 



y VPeW^p 

p 

subject to ©A — ^ V-' , supp(y^) C gj^ 

j=i 

where the relation between and 0a is as in equation ([55]). We can rewrite this as 



(56) 



nO(0) = min l^WV^l, 

yi VGMpxp 

p p 

subject to ® = 1^ V^' + ( ^ yjf, supp(y^) C gj. 



Now, define a px p matrix A such that 















if 


i<j 






if 


i>j 






if 





(57) 



P.P. 

NotethatA+A^ = + {^V^)'^ . Furthermore, ||y-'||/? = ||Aj||2, where Ay denotes the yth column of A. 

7=1 ;=i 
So we can rewrite ( [ST] ) as 



nO(0) = min ^||A,||2 

V^GMPXP 

subject to 0=A+A^. 



(58) 



This is exactly the RCON penalty with K = I. Thus, with a bit of work, we have derived the RCON from the 



overlap norm (Obozinski et al. 2011 1. The RCON penalty can be seen as an elegant variant of the overlap 



norm that naturally accommodates groups given by the rows and columns of a matrix. 

Appendix E Derivation of Updates for ADMM Algorithms 

We derive the updates for ADMM algorithm when applied to PNJGL and CNJGL formulations respectively. 
We first begin with the PNJGL formulation. 
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El Updates for ADMM Algorithm for PNJGL 

Let Lp{&\&^ ,V ,W ,F ,G,Q^ ,Q^) denote the augmented Lagi-angian (18i. In each iteration of the 

ADMM algorithm, each primal variable is updated while holding the other variables fixed. The dual variables 
are updated using a simple dual-ascent update rule. Below, we derive the update rules for the primal variables. 

F. 1 . 1 ©' UPDATE 

Note that 

0i = argmin Lp{@,@^,Z\Z^,V,W,F,G,Q\Q^) 







argmin «i(— logdet©) +p 




0-i ((02 + y + w+zi)-i(F + ei+«i5O) 

Now it follows from the definition of the Expand operator that 

0i ^ExpandQ(02+y + W+Zi)-^(ei+«i5'+F),p,«i^ . 



The update for can be derived in a similar fashion. 

F.1.2 Z^ UPDATE 

Z' 



argmin Lp{& ,@\Z,Z^ ,V ,W ,F ,G,Q\Q') 
z 



argmm j 

z 



By the definition of the soft-thresholding operator % , it follows that 

G' ^1 



z' = % @ 



P P 



The update for Z^ is similarly derived. 
F. 1.3 V UPDATE 

V = argmin Lp{&\&^,Z\z'-,X,W,F,G,Q\Q^ 



argmm 

X 



( +0 -0 -H^) 



1 



(F-G) 



By the definition of the soft-scaling operator 12, it follows that 

V = T2Q(W--W + 0'-0^) + l(F-G),| 
The update for W is easy to derive and we therefore skip it. 



F.2 Updates for ADMM Algorithm for CNJGL 



Let Lp{{@'},{Z'},{V'},{W'},{F'},{G'},{Q'}) denote the augmented Lagrangian (22 1. Below, we derive 
the update rules for the primal variables {V'}. The update rules for the other primal variables are similar to 
the derivations discussed for PNJGL, and hence we omit their derivations. 
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The update rules for V\V^ , . . . are coupled, so we derive them simultaneously. Note that 



{y'}f=i = argmin i^p ({0Un{Z'}f=i>{^'}f=i.{^'}f=p{nf=i,{G'}f=p{e'}f=i) 

Ai-diag(Ai) 



argmin X2 ^ 

A' A*^ ;=1 



A'^-diag(A'^) 



A" 

pE 

1=1 



^ ( (Wy + ©'■ -W' + - {F' - G') 
z \ p 



Let C' = UiW'f + ©'■ - W + i (F' - G')). Then the update 



y 



■2; 



C'-diag(C') 




" diag(Ci) " 




C^-diag(C^) _ 


'2pJ + 


. cliag(C^) _ 



follows by inspection. 

Appendix G. Additional Simulation Results 

Here we present more detailed results for an instance of the simulation study described in Section |6j for 
the case n = 25. Figure 12 illustrates how the PPC, TPPC, PCC and TPCC metrics are computed. As 
described in Table 1 for PNJGL, PPC is given by the number of columns of V whose £2 norms exceed 



the threshold fj. Figure 12 a) indicates that the two perturbed nodes in the data are identified as perturbed 
by PNJGL. Furthermore, given the large gap between the perturbed and non-perturbed columns, PPC is 
relatively insensitive to the choice of t^. Similar results apply to the TPPC, PCC and TPCC metrics. 



In order to generate Figure 12 PNJGL, FGL, CNJGL, GGL, and GL were performed using tuning pa- 
rameter values that led to the best identification of perturbed and cohub nodes. However, the results displayed 



in Figure 12 were quite robust to the choice of tuning parameter. 
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Figure 12: In all plots, the x-axis indexes the columns of the indicated matrix, and the y-axis displays the li 
norms of the columns of the indicated matrix, with diagonal elements removed. The sample size 
is « = 25. Perturbed nodes are indicated in red, and cohub nodes are indicated in blue, (a)-(c): 
Detection of perturbed nodes by PNJGL with q ~ 2, FGL, and GL. (d)-(i): Detection of cohub 
nodes by CNJGL with q^2, GGL, and GL. (a): PNJGL with q = 2 was performed with Xi = 2.5 
and A,2 = 12.5. (b): FGL was performed with A-i — 2.5 and A-2 = 0.75. (c): GL was performed 
with A. 1.5. (d), (g): CNJGL was performed with ^ = 2 and A-i 0.5, X2 = 37.5. (e), (h): GGL 
was performed with A-i = 0.5 and A-2 = 2.5. (f), (i): GL was performed with X = 0.75. 
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